Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Incorrect expression naming for struct get #1832

Merged
merged 2 commits into from
Feb 7, 2024

Conversation

kevinzwang
Copy link
Member

@kevinzwang kevinzwang commented Feb 2, 2024

The reason why select from #1810 wasn't working was because there was a discrepancy between the name element in the Field struct of the struct get expression and the name() method of the expression.

This PR is a quick fix to that behavior for that specific expression, but ideally we would want to consolidate the logic between Expr::to_field and Expr::name so that there aren't two code paths for getting the name of an expression. The problem behind that is Expr::to_field takes in a schema, which is not always easily obtainable during the calls we make in the code to Expr::name. Notably, Python code in the physical plan and execution step generally do not use the schema, but do use expression names. Thus always requiring a schema for Expr::name would not work.

This raises the question: would a schema ever affect the name of an expression? The conclusion from a conversation with @clarkzinzow and @jaychia was no. Thus, would it be reasonable to separate the name from an expression's field, so that Expr::name was the single source of truth?

Moreover, in the future we would like to do table-scoped columns so that we are able to construct expressions that take data from multiple tables. Once we do that, it should be possible to derive the dtype of an expression by determining the dtypes of the columns or literals which are leaves in the expression tree. Then, we wouldn't even need to pass around a schema altogether?

Anyway, lots to think about. We'll leave this as tech debt that should be reworked when implement table-scoped columns, since that will likely require rethinking a lot about schemas and expressions as well.

@kevinzwang kevinzwang linked an issue Feb 2, 2024 that may be closed by this pull request
@github-actions github-actions bot added the bug Something isn't working label Feb 2, 2024
@kevinzwang kevinzwang marked this pull request as ready for review February 2, 2024 00:15
Copy link

codecov bot commented Feb 2, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (caa7bab) 85.62% compared to head (7e4578b) 85.54%.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #1832      +/-   ##
==========================================
- Coverage   85.62%   85.54%   -0.09%     
==========================================
  Files          55       55              
  Lines        6102     6102              
==========================================
- Hits         5225     5220       -5     
- Misses        877      882       +5     

see 4 files with indirect coverage changes

Copy link
Member

@samster25 samster25 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@samster25
Copy link
Member

@kevinzwang can you also make a ticket noting the tech debt and what the suggested solution is?

@kevinzwang kevinzwang merged commit fbc026a into main Feb 7, 2024
42 checks passed
@kevinzwang kevinzwang deleted the kevin/struct-getter-name branch February 7, 2024 09:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] Cannot select 2 fields at once from a df using struct accessor
3 participants